1 Introduction

Let $X = (X_1, \dots, X_n)$ be independent observations from a repeated experiment, with
common distribution function $F$. Let $F_n$ be the empirical distribution and
$S(X_1, \dots, X_n) = S(F_n)$ be a statistic of the observations. The precision of $S(F_n)$,
understood as the magnitude of the estimation error, is a strictly decreasing function
of $n$ and the sample size is thus a crucial issue.

It is often possible to increase the sample size by acquiring additional observations
$X' = (X_{n+1}, \dots, X_{n+k})$. This is done at additional cost and time, for example by
increasing the cohorts in clinical trials or sequencing additional genes in molecular biology.
In a parametric framework where $F$ belongs to some family $(F_\theta)_{\theta \in \Theta}$,
$S(X_1, \dots, X_n)$ would typically be an estimator of $\theta$ satisfying $S(F_\theta) = \theta$
and the precision, often of order $n^{-1/2}$, should decrease by using $X'$. However the truth
is often more complex. The use of additional observations raises at least two issues, which
are addressed in this paper. The first one is the relevance of additional observations to the
inference problem. If the additional observations $X'$ do not share the distribution function
$F$ with $X$, it is certainly unwise to expect better precision when using them in the
inference. We therefore need to assess whether $X'$ is distributed consistently with $F$.
Focusing on the average modification induced by extending the sample to $X'$, we provide in
Section 3 an approximation to the law of this modification, under the consistency hypothesis.
This approximation is then fed in Section 4 to a test procedure and used to control the
type I error. The second issue is the relevance of acquiring the data. If the common
distribution $F'$ of observations in $X'$ is close to $F$, a single additional observation is
unlikely to be enough to detect the difference between $F$ and $F'$. Indeed $k$ needs to be
larger than some function of $n$ for the test to be powerful. In test language, for given $F'$
and $F$, it is similar to finding the sample size needed to achieve a power exceeding some
threshold. This issue can be solved using results of Section 3 and is addressed in Section 4.

These two issues arise in a slightly different form in sequential tests of hypotheses and
sequential change point detection. When collecting new observations is lengthy and costly,
waiting for completion of a sample of size $n$ before performing the analyses is not an option.
In such an instance, it is desirable to use any new observation as soon as it becomes available.
Wald's Sequential Probability Ratio Test (SPRT), introduced in his seminal paper (Wald, 1945)
and tightly connected to the classical Neyman-Pearson test for fixed sample size, does
just this. Sequential tests stop sampling as soon as a positive result is detected and can thus
provide results faster than classical tests, as the success story of the Beta-Blocker Heart
Attack Trial (BHAT) proved in 1981 when it ended 8 months earlier than scheduled with positive
results (Study, 1981).



Although modifications exist to account for composite hypotheses (Brodsky and Darkovsky,
2005), sequential tests usually test $H_0 : F = F_0$ against $H_1 : F = F_1$, i.e. observations
are either all distributed according to $F_0$ or all distributed according to $F_1$, which is
different from our main concern, since new data can have a different distribution function
than the previous ones. Sequential change point detection is closer in essence to our needs,
although it does not perfectly fit them either.

Sequential change point detection is heavily used in statistical quality control. It is used to
answer three questions: has a production process run out of control, when did it run out of
control and what is the magnitude of the change? Assume that the observations are distributed
according to $F_0$ under the state of control and according to $F_1$ under the other state.
Noting $\tau$ the point in time at which the jump is detected and $\nu$ the point at which it
occurs, most of the change point detection literature is interested in minimizing
$E[(\tau - \nu)^+]$, the average number of additional observations needed to detect the change.
This is very close to our concern: new observations not being consistent with the previous ones
is equivalent to a process running out of control at time $n$. The CUSUM (cumulative sum) charts
use the current observation to detect significant departures of the process from the state of
control (Page, 1954). Lai (1995) showed that a moving average scheme consisting of only a finite
size observation window around the current observation is asymptotically as efficient as the
CUSUM if the window size grows suitably fast to infinity. Brodsky and Darkovsky (2000)
generalize this result to a larger class of schemes. But all these methods are likelihood-based
and assume $F_0$ and $F_1$ are simple enough for the log-likelihood ratio to be easily computed.



Benveniste et al. (1987) use weak convergence theory to extend CUSUM to non-likelihood-based
procedures. Their asymptotic local approach uses convergence of the rescaled sums of
detection statistics to a gaussian process. Lai and Shan (1999) use another approach based
on moderate deviations to extend a Generalized Likelihood Ratio (GLR) to non-likelihood-based
detection statistics. We present in this paper an original non-likelihood-based method
to check the consistency of a new batch of observations with previous ones. Our method
requires very few assumptions about $F_0$ and $F_1$ and builds upon a simple and intuitive idea:
under the hypothesis of consistency, the precision gain obtained when adding $k$ observations
to the sample can roughly be estimated by the precision loss induced by removing $k$
observations from the sample.

Our work is motivated by the study of DNA sequences. Organisms' genomes are sequenced
gene by gene: when new genes become of interest for the community, they are simultaneously
sequenced in several organisms. Waiting for all genes from all species to be sequenced
before proceeding to an analysis is of course not an option. The current standard is to use
as many genes as available: concatenating several genes into one supergene increases the
sample size - here the gene length - and implies a more accurate analysis. Such concatenation
implicitly assumes that every new gene has the same evolutionary history as the others.
Unfortunately, there is no certainty about that. It is well known that many mechanisms -
recombination, selective sweep, purifying or positive selection among others (Balding et al.,
2007) - lead different genes to have different histories. When a new gene becomes available,
it should thus be tested for consistency before being included in the sample. If there is
suspicion or exterior information that the new gene does not share a common history with
the previous ones, the focus is on the minimum gene length necessary to confidently assess
the difference, as in the optimization of the change point detection.

The issue of change point detection is hardly new but, unlike most methods available in
the sequential tests literature, the alternative hypothesis is not well specified: a gene can
be affected by a number of evolutionary events and thus have a number of evolutionary
histories. Specifying one, or even a finite set, of those histories in $H_1$ is hardly better
than an educated guess. The main focus is thus on rejecting $H_0$, close in philosophy to the
Repeated Significance Test (RST) (Armitage et al., 1969; O'Brien and Fleming, 1979; Pocock, 1977).
This particular issue of assessing consistency when the alternative is not well specified
can also be found in the online learning literature and is there referred to as concept drift
(Domingos and Hulten, 2000).

The article is organized as follows: Section 2 introduces the key concepts and provides
intuition about the kind of results we expect. Section 3 presents our main results, derived
from Edgeworth expansions, and discusses their strong and weak points. Section 4 builds
upon the results of Section 3 to present a test of consistency of a new set of data with
previous ones. Proofs are postponed to Section 5.
2 Definitions and Notations

2.1 Definition of $\Delta_{n,+k}$ and $\Delta_{n,-k}$

Let $(X_1, \dots, X_n, \dots)$ be a sequence of i.i.d. random variables whose common
distribution function is $F_0$. Consider the sample mean of the first $n$ terms:

$$Y_n = \frac{1}{n} \sum_{i=1}^{n} X_i$$

and define:

$$\Delta_{n,+k} = Y_{n+k} - Y_n, \qquad \Delta_{n,-k} = Y_n - Y_{n-k}.$$

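Since $\Delta_{n,+k}$ is invariant by translation of the $X_i$s, we assume without loss of
generality that the $X_i$ are centered ($E[X_1] = \mu = 0$) and furthermore note:

$$E[X_1^2] = \sigma^2, \qquad E[X_1^3] = \kappa, \qquad E[|X_1|^3] = \beta_3 < \infty.$$

An alternative definition of $\Delta_{n,+k}$ is

$$\Delta_{n,+k} = \frac{1}{n+k} \sum_{j=1}^{k} X_{n+j} - \frac{k}{n(n+k)} \sum_{i=1}^{n} X_i. \qquad (1)$$

$\Delta_{n,+k}$ (resp. $\Delta_{n,-k}$) is centered with distribution function $F_+$ (resp. $F_-$)
and variance $\sigma^2_{n,+k}$ (resp. $\sigma^2_{n,-k}$) where

$$\sigma^2_{n,+k} = \frac{k \sigma^2}{n(n+k)}, \qquad \sigma^2_{n,-k} = \frac{k \sigma^2}{n(n-k)}.$$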

$\Delta_{n,+k}$ (resp. $\Delta_{n,-k}$) represents the perturbation of the sample mean induced
by adding (resp. removing) $k$ units from the sample. As one would expect, when $n$ increases,
perturbations to the sample mean become the same whether $k$ terms are added to or removed from
the sample. To formalize this intuition, we focus on the difference $F_+ - F_-$.
$F_+(x) - F_-(x)$ is convenient for at least two reasons: using appropriate expansion
techniques, we can get results about its order of magnitude, and
$\sup_{x \in \mathbb{R}} |F_+(x) - F_-(x)|$, the quantity of interest in the
Kolmogorov-Smirnov test, is easy to calculate given some expansion of $F_+(x) - F_-(x)$.
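
As a numerical sanity check on these definitions, the following Python sketch computes both perturbations on simulated samples and compares their empirical variances with $k\sigma^2/(n(n+k))$ and $k\sigma^2/(n(n-k))$. It assumes gaussian data and takes $\Delta_{n,-k}$ to be $Y_n - Y_{n-k}$; the function names and simulation sizes are ours and purely illustrative.

```python
import random
import statistics


def delta_plus(xs, n, k):
    """Delta_{n,+k} = Y_{n+k} - Y_n for a sequence xs of length >= n + k."""
    return sum(xs[:n + k]) / (n + k) - sum(xs[:n]) / n


def delta_plus_alt(xs, n, k):
    """Same quantity written as the weighted difference of Eq. (1)."""
    return sum(xs[n:n + k]) / (n + k) - k * sum(xs[:n]) / (n * (n + k))


def delta_minus(xs, n, k):
    """Delta_{n,-k} = Y_n - Y_{n-k} (assumed sign convention)."""
    return sum(xs[:n]) / n - sum(xs[:n - k]) / (n - k)


if __name__ == "__main__":
    rng = random.Random(0)
    n, k, sigma, reps = 200, 20, 1.0, 2000  # illustrative values
    plus, minus = [], []
    for _ in range(reps):
        xs = [rng.gauss(0.0, sigma) for _ in range(n + k)]
        plus.append(delta_plus(xs, n, k))
        minus.append(delta_minus(xs, n, k))
    # Empirical variance next to the theoretical one, for each perturbation.
    print(statistics.pvariance(plus), k * sigma ** 2 / (n * (n + k)))
    print(statistics.pvariance(minus), k * sigma ** 2 / (n * (n - k)))
```

The two forms of $\Delta_{n,+k}$ agree exactly on any input, which gives a cheap check of Eq. (1) independent of the simulation.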



2.2 Characteristic Function 

But, before proceeding to the derivation of the expansion, we recall a few properties of
characteristic functions and use them to get insight into the difference between
$\Delta_{n,+k}$ and $\Delta_{n,-k}$. Let $X$ be a real valued random variable with distribution
function $F_X$. Let $f_X$ be the characteristic function of $X$ defined as
$f_X(t) = E[e^{itX}] = \int e^{itx} dF_X(x)$.

Hereafter and unless specified otherwise, we use the shorthands $f$ for $f_X$, $f_+$ for
$f_{\Delta_{n,+k}}$ and $f_-$ for $f_{\Delta_{n,-k}}$. Thanks to Eq. (1) and classical properties
of the characteristic function for independent random variables, we have

$$f_+(t) = f\left(\frac{t}{n+k}\right)^{k} f\left(-\frac{kt}{n(n+k)}\right)^{n}, \qquad f_-(t) = f\left(\frac{t}{n}\right)^{k} f\left(-\frac{kt}{n(n-k)}\right)^{n-k}. \qquad (2)$$
A Taylor expansion around 0 yields

$$f_-(t) - f_+(t) = -\frac{k^2 \sigma^2 t^2}{n^3}$$

where lower order terms have been omitted. Note that
$Var(\Delta_{n,+k}) \sim Var(\Delta_{n,-k}) \sim k\sigma^2/n^2$.
Normalizing $\Delta_{n,+k}$ and $\Delta_{n,-k}$ so that they have asymptotic variance 1 and
considering the difference between the characteristic functions of the normalized versions yields

$$f_-\left(\frac{nt}{\sqrt{k}\sigma}\right) - f_+\left(\frac{nt}{\sqrt{k}\sigma}\right) = -\frac{k t^2}{n} \qquad (3)$$



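omitting again all lower order terms. Since the first order term in the expansion of $f_- - f_+$
around 0 is of order $k/n$, and although a local expansion is not enough to prove it,
we expect from the inversion theorem the difference $F_- - F_+$ to be of order $k/n$. However, in
order to achieve this result, two competing speeds need to be balanced: $k^{-1/2}$ and $k/n$. An
intuitive justification follows. It is clear that

$$\frac{n}{\sqrt{k}\sigma} \Delta_{n,+k} = \frac{n}{n+k} \cdot \frac{1}{\sqrt{k}\sigma} \sum_{j=1}^{k} (X_{n+j} - \bar{X}_n) \quad \text{and} \quad \frac{n}{\sqrt{k}\sigma} \Delta_{n,-k} = \frac{1}{\sqrt{k}\sigma} \sum_{j=1}^{k} (X_{n-k+j} - \bar{X}_{n-k}) \qquad (4)$$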



where $\bar{X}_n$ is the empirical mean of an $n$-sample of i.i.d. $X_i$. Since
$\bar{X}_n = \mu + O_P(n^{-1/2})$, it is clear from Eq. (4) that
$\frac{n}{\sqrt{k}\sigma}\Delta_{n,+k}$ can be thought of as the standardized sum of $k$ i.i.d.
roughly centered random variables with variance 1. If $k$ goes to infinity with $n$, the speed
$k^{-1/2}$ is thus the usual speed of the central limit theorem whereas $k/n$ is the speed of the
first order difference between the variances of $\Delta_{n,+k}$ and $\Delta_{n,-k}$. Depending on
the regularity of $F$ and the compared speed of $k^{-1/2}$ and $k/n$, we can make the intuition
rigorous and prove the assertion:

$$F_+\left(\frac{\sqrt{k}\sigma x}{n}\right) - F_-\left(\frac{\sqrt{k}\sigma x}{n}\right) = \frac{k}{n} \frac{x e^{-x^2/2}}{\sqrt{2\pi}} + o\left(\frac{k}{n}\right) \qquad (5)$$



uniformly in $x$. Proper formulations and proofs are provided in Section 3.
Eq. (3) provides an asymptotic expansion of $f_+ - f_-$ in an interval around 0 and, although
it gives some insight about the resulting Eq. (5), it is not powerful enough to derive it
properly. We therefore resort to Edgeworth expansion, with an Edgeworth series acting as
a middleman between $f_+$ and $f_-$. This is the aim of Section 3.



3 Edgeworth Expansion 

Edgeworth series provide an approximation of a probability distribution in terms of its
cumulants and are an improvement to the central limit theorem. The nice property of
Edgeworth expansions is that they are true asymptotic expansions. We can thus control the
error between a probability distribution and its Edgeworth expansion. The literature about
Edgeworth expansion is quite abundant and full of powerful results. However most, if not
all, of these results rely heavily on $f$ satisfying the so-called Cramer's condition:

$$\limsup_{|t| \to \infty} |f(t)| < 1 \qquad (6)$$

Cramer's condition holds as soon as $F$ has an absolutely continuous component (Hall,
1984) but we take a special interest in non-lattice completely discontinuous $F$ (i.e. discrete
$X$) for which condition (6) is not satisfied. We deal with distribution functions satisfying
Cramer's condition in Section 3.1 before turning to non-lattice discrete distribution functions
in Section 3.2. Proofs are postponed to Section 5.



3.1 With Cramer's Condition 

The main result of this section is the following: 

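Theorem 3.1 Let $(X_i)$ be a sequence of i.i.d. real valued random variables with distribution
function $F$. Suppose that Cramer's condition holds, i.e. that
$\limsup_{|t| \to \infty} |f(t)| < 1$. Suppose furthermore that there exists an integer
$m \ge 1$ such that $E[|X|^{m+2}] < \infty$ and consider $\alpha \in (\frac{2}{m+2}, 1)$.
If $k \sim n^\alpha$ then:

$$F_+\left(\frac{\sqrt{k}\sigma x}{n}\right) - F_-\left(\frac{\sqrt{k}\sigma x}{n}\right) = \frac{x e^{-x^2/2}}{\sqrt{2\pi}} \frac{k}{n} + o\left(\frac{k}{n}\right) \qquad (7)$$

uniformly in $x$.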



If $E[|X|^m] < \infty$ for all $m$, as is the case for gaussian random variables, $\alpha$ can
take any value in $(0, 1)$. The only missing case is $k = o(n^\epsilon)$ for all $\epsilon > 0$.
In particular, and unlike for gaussian variables (as will be shown in Prop. 4.1), $k$ cannot be
fixed or grow only logarithmically with $n$.



3.2 Without Cramer's Condition 



The main result of this section is the following: 

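Theorem 3.2 Let $(X_i)$ be a sequence of i.i.d. real valued random variables with distribution
function $F$. Suppose that $X$ is a non-lattice, discrete random variable. Suppose furthermore
that $\beta_3 = E[|X|^3] < \infty$ and consider $\alpha \in (\frac{2}{3}, 1)$.
If $k \sim n^\alpha$ then:

$$F_+\left(\frac{\sqrt{k}\sigma x}{n}\right) - F_-\left(\frac{\sqrt{k}\sigma x}{n}\right) = \frac{x e^{-x^2/2}}{\sqrt{2\pi}} \frac{k}{n} + o\left(\frac{k}{n}\right) \qquad (8)$$

uniformly in $x$.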



The fundamental difference between Theorems 3.2 and 3.1 lies in the range of values $\alpha$
can take. When the distribution function $F$ of $X$ has some absolutely continuous component,
$k$ is allowed, upon moment conditions, to grow slowly compared to $n$. When the distribution
function is completely discrete, the third order moment is enough to achieve the expansion.
Higher order moments, even if they do exist, are not sufficient to expand the range of values
$\alpha$ can take and are thus not required.
3.3 New Generating Process

The main result of this section is the following:

Theorem 3.3 Let $(X_i)$ (resp. $(Y_j)$) be a sequence of i.i.d. real valued random variables with
distribution function $F_0$ (resp. $F_1$). Suppose that $X$ (resp. $Y$) has finite expectation
$\mu_0$ (resp. $\mu_1$) and variance $\sigma_0^2$ (resp. $\sigma_1^2$). Suppose furthermore that
$\beta_3 = E[|Y|^3] < \infty$ and consider $\alpha \in (0, 1)$. If $k \sim n^\alpha$, then:

$$F_+\left(\frac{\sqrt{k}\sigma_1 x}{n}\right) = \Phi\left(x - \frac{n}{n+k} \frac{\sqrt{k}(\mu_1 - \mu_0)}{\sigma_1}\right) + O(n^{-\beta}) \qquad (9)$$

uniformly in $x$, where $\beta = \min(\frac{\alpha}{2}, 1 - \alpha)$. If $x$ is restricted to a
bounded range and $\mu_1 \neq \mu_0$, the correcting term $n/(n + k)$ is unnecessary and Eq. (9)
simplifies to

$$F_+\left(\frac{\sqrt{k}\sigma_1 x}{n}\right) = \Phi\left(x - \frac{\sqrt{k}(\mu_1 - \mu_0)}{\sigma_1}\right) + O(n^{-\beta}). \qquad (10)$$



Theorem 3.3 requires a third order condition on the new generating process $Y$ to ensure
that the remaining term is of order $O(k^{-1/2})$. Neglecting second order terms,
$\frac{n}{\sqrt{k}\sigma_1}\Delta_{n,+k}$ behaves like a gaussian variable with mean
$\frac{\sqrt{k}(\mu_1 - \mu_0)}{\sigma_1}$ and variance 1. As we could expect, the mean
diverges faster if $\mu_0$ and $\mu_1$ are well separated when compared to the scale $\sigma_1$.



3.4 About Discrete Distributions 

Our motivating example of DNA analysis is intimately linked to discrete state spaces. When
comparing the same gene among a set of $s$ organisms, each nucleotide in a species is
associated with its homologs in the remaining species. An observation consists of an $s$-tuple
of nucleotides. Each nucleotide can take value in the set $\{A, C, G, T\}$ and thus the
$s$-tuples take value in $\{A, C, G, T\}^s$. The statistic of interest is the likelihood of an
observation under a given model. The observations are intrinsically discrete and so is the
likelihood of an observation under a given model. To turn these likelihoods into continuous
variables and allow for the use of Theorem 3.1 instead of the less powerful Theorem 3.2, we
must resort to the trick presented hereafter.

Formally, consider a discrete space $A = (a_i)_{i=1,\dots,N}$ and a probability measure
$\theta = (\theta_1, \dots, \theta_N)$ on $A$. In DNA analysis, $A = \{A, C, G, T\}^s$ and
$\theta$ is a model assigning a probability to each $a \in A$. Assume $\theta_i > 0$ for all $i$
and let $(Z_i)_{i \in \mathbb{N}}$ be a sequence of i.i.d. random variables such that
$P(Z = a_j) = \theta_j$ for $j = 1, \dots, N$. We take a special interest in
$(X_i)_{i \in \mathbb{N}}$ defined as

$$X_i = \log P(\{Z_i\}) = \sum_{j=1}^{N} \log P(Z_i = a_j) 1_{\{Z_i = a_j\}}$$

$(X_i)$ is clearly an i.i.d. sequence of discrete random variables such that
$P(X = \log \theta_j) = \theta_j$. In this case, we can prove thanks to Theorem 3.2 that
$\sup_x |F_+ - F_-| = \frac{k}{n\sqrt{2\pi e}} + o(k/n)$ but only if
$k \sim n^\alpha$ with $\alpha \in (2/3, 1)$. We don't have access to lower values of $\alpha$.
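
To make the construction concrete, the sketch below draws the $Z_i$ from a given $\theta$, maps them to the log-probabilities $X_i$, and computes the exact mean and variance of $X$. The constant at the end is the supremum over $x$ of $|x| e^{-x^2/2}/\sqrt{2\pi}$, reached at $x = \pm 1$, which is the coefficient of $k/n$ in the supremum of the leading term. The state encoding (integers $0, \dots, N-1$) and the function names are assumptions made for the illustration only.

```python
import math
import random


def log_prob_sample(theta, size, rng):
    """Draw Z_1..Z_size i.i.d. from theta and return X_i = log P(Z = Z_i)."""
    states = range(len(theta))  # states a_1..a_N encoded as 0..N-1
    zs = rng.choices(states, weights=theta, k=size)
    return [math.log(theta[z]) for z in zs]


def moments(theta):
    """Exact mean and variance of X = log(theta_Z) when Z ~ theta."""
    mean = sum(p * math.log(p) for p in theta)
    var = sum(p * (math.log(p) - mean) ** 2 for p in theta)
    return mean, var


# sup over x of |x| * exp(-x^2 / 2) / sqrt(2 * pi), attained at x = +/- 1.
LEADING_SUP = 1.0 / math.sqrt(2.0 * math.pi * math.e)
```

Since the perturbations are invariant by translation, the non-zero mean of $X$ returned by `moments` plays no role; only the variance enters the rescaling.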

Suppose now that $\theta$ is not the same for all $Z_i$ but rather that each $Z_i$ is drawn from
$A$ according to a specific $\alpha^{(i)} = (\alpha^{(i)}_1, \dots, \alpha^{(i)}_N)$ and
furthermore that $(\alpha^{(i)})$ is an i.i.d. sequence from a Dirichlet distribution
$Dir(\lambda\theta)$ that has density:

$$f(v_1, \dots, v_{N-1}) = \frac{\Gamma(\lambda)}{\prod_{i=1}^{N} \Gamma(\lambda\theta_i)} \prod_{i=1}^{N} v_i^{\lambda\theta_i - 1}$$

for all $v_1, \dots, v_{N-1} > 0$ such that $\sum_{i=1}^{N-1} v_i < 1$ and
$v_N = 1 - \sum_{i=1}^{N-1} v_i$. Intuitively, $(V_1, \dots, V_N)$ is a vector of the $N$
dimensional unit simplex with mean $\theta$ and variance inversely proportional to $\lambda$:
the marginal distribution of $V_i$ has mean $\theta_i$ and variance
$\frac{\theta_i(1 - \theta_i)}{\lambda + 1}$. Using $Dir(\lambda\theta)$ instead of $\theta$ can
be seen as a regularization of the previous case, with $\theta$ being the limiting case of
$Dir(\lambda\theta)$ when $\lambda$ goes to infinity.

It is then easily seen that the $X_i$ are i.i.d. random variables taking value in $\mathbb{R}_-$
and absolutely continuous with respect to the Lebesgue measure. A bit of algebra gives for all $m$

$$E[|X|^m] = \sum_{i=1}^{N} E[V_i |\log V_i|^m] = \sum_{i=1}^{N} \frac{\Gamma(\lambda)}{\Gamma(\lambda\theta_i)\Gamma(\lambda(1 - \theta_i))} \int_0^1 |\log^m(x)| x^{\lambda\theta_i} (1 - x)^{\lambda(1 - \theta_i) - 1} dx < \infty$$

In this case of particular interest, Theorem 3.1 applies for any value of $\alpha$ in $(0, 1)$
as $m$ can be taken arbitrarily large.
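
The regularization can be simulated with the standard library alone. The sketch below assumes that $X_i$ is the log of the probability that the redrawn vector assigns to the observed state, and it samples the Dirichlet vector by normalizing independent Gamma variates; the function names are hypothetical and the code is an illustration, not the procedure used for the results of this paper.

```python
import math
import random


def dirichlet(alpha, rng):
    """One draw from Dir(alpha) via normalized independent Gamma(alpha_i, 1) variates."""
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(g)
    return [x / total for x in g]


def regularized_log_prob(theta, lam, size, rng):
    """X_i = log V_{Z_i}, with V ~ Dir(lam * theta) redrawn for each i and Z_i ~ V."""
    alpha = [lam * t for t in theta]
    xs = []
    for _ in range(size):
        v = dirichlet(alpha, rng)                        # random probability vector
        z = rng.choices(range(len(v)), weights=v, k=1)[0]  # state drawn from v
        xs.append(math.log(v[z]))                        # continuous, negative value
    return xs
```

With a large $\lambda$ the draws concentrate around the discrete values $\log \theta_j$, while for any finite $\lambda$ they remain continuous, which is the point of the trick.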



4 Application to Test 



Theorems 3.1 and 3.2 are useful for detecting changes in the generating process of new
observations.

We want to test whether the new batch of observations is generated by the same process
as the previous observations. Formally, given two probability distributions $F_0$ and $F_1$, and
a sequence of independent random variables $(X_i)$ with associated distribution functions
$F_{X_i}$, we want to test $H_0$: "$F_{X_i} = F_0$ for $i = 1, \dots, n + k$" against $H_1$:
"$F_{X_i} = F_0$ for $i \le n$ and $F_{X_i} = F_1$ otherwise".

In our problem, the statistic of interest is the sample mean, calculated either on all $n + k$
observations ($Y_{n+k}$) or only the previous $n$ observations ($Y_n$). We shall therefore assume
that $F_0$ and $F_1$ have different means $\mu_0$ and $\mu_1$. $\Delta_{n,+k} = Y_{n+k} - Y_n$
represents the influence of the batch of $k$ new observations on the mean, i.e. the translation
of the sample mean induced by adding the batch of new observations to the calculation. The use
of the term "influence" is not coincidental: $\Delta_{n,+k}$ is strongly connected to influence
functions (Hampel, 1974; Huber, 2004). When the quantity to estimate is the mean $\mu$ of a
distribution and $k = 1$, $n\Delta_{n,+1}$ is indeed, up to the factor $n/(n+1)$, exactly the
empirical influence value of observation $X_{n+1}$ on the estimator
$Y_n = \frac{1}{n} \sum_{i=1}^{n} X_i$ of $\mu$, i.e. the influence of an infinitesimal
perturbation on $F_n$ along the direction $\delta_{X_{n+1}}$, the unit mass at point $X_{n+1}$.

Large positive or negative influence values flag the corresponding observations as
potential outliers whereas small to moderate influence values support consistency of the
data. Up to a rescaling, $\Delta_{n,+k}$ can be understood as an extension of influence
functions to a batch of observations instead of a single one.
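
The link with influence values for $k = 1$ follows from the identity $\Delta_{n,+1} = (X_{n+1} - Y_n)/(n+1)$, which the short sketch below checks numerically. The function names are ours and the example is only meant as an illustration.

```python
def delta_plus_one(xs):
    """Delta_{n,+1}: shift of the mean when the last element of xs joins the first n."""
    n = len(xs) - 1
    return sum(xs) / (n + 1) - sum(xs[:n]) / n


def empirical_influence(xs):
    """Empirical influence value of the last element on the mean of the first n."""
    n = len(xs) - 1
    return xs[n] - sum(xs[:n]) / n
```

On any input, `(n + 1) * delta_plus_one(xs)` and `empirical_influence(xs)` coincide up to floating point error.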
4.1 Distribution of $\Delta_{n,+k}$ under $H_0$

Let $k \in [n^{\beta_1}, n^{\beta_2}]$ with $\beta_1$ and $\beta_2$ to be specified later. Under
$H_0$, $F_{X_i} = F_0$ for $i = 1, \dots, n + k$ and it follows from Theorem 3.1 for continuous
and Theorem 3.2 for discrete distributions that $\Delta_{n,+k}$ and $\Delta_{n,-k}$ have the same
distribution function, up to a correcting term of order $k/n$. For discrete distributions,
$(\beta_1, \beta_2) = (\frac{2}{3} + \epsilon, 1 - \epsilon)$ where $\epsilon$ is an arbitrarily
small positive value. For continuous distributions,
$(\beta_1, \beta_2) = (\frac{2}{m} + \epsilon, 1 - \epsilon)$ where $\epsilon$ is again an
arbitrarily small positive value and $m$ is the highest order moment of $F_0$.

The analogue of the alternative definition Eq. (1) for $\Delta_{n,-k}$ gives different weights
to $(X_1, \dots, X_{n-k})$ and $(X_{n-k+1}, \dots, X_n)$. Under $H_0$, the first $n$ observations
are identically distributed and exchangeable. Exchangeability implies that the order of
$(X_1, \dots, X_n)$ does not matter. Since their order does not matter,
$(X_{n-k+1}, \dots, X_n)$ can be replaced by any other subset of $(X_1, \dots, X_n)$ of
size $k$. In particular, the distribution of $\Delta_{n,-k}$ can be approximated by repeatedly
selecting $k$ terms from $(X_1, \dots, X_n)$ and substituting them for $(X_{n-k+1}, \dots, X_n)$.

When the distribution $F_0$ of the $X_i$ under $H_0$ is not a simple parametric function or
involves a large number of parameters, the exact distribution function of $\Delta_{n,+k}$ is out
of reach. Even an Edgeworth expansion à la Prop. 5.9 requires the estimation of many cumulants.
By contrast a good numerical approximation of $F_-$ is available thanks to the previous remark
and we can substitute it for $F_+$. Adding the correcting term of order $k/n$ only requires the
estimation of the standard deviation $\sigma$ of $F_0$. One may also notice that, since there
are $n + k$ observations with $n$ larger than $k$, the estimation of $\sigma$ is significantly
more accurate than the approximation of $F_-$ by its empirical version.

Wrapping up the preceding remarks, the distribution $F_-$ of $\Delta_{n,-k}$ can be approximated
in the following way:

(i) Compute the mean $Y_n$ of the $n$ observations;

(ii) Select at random without replacement $k$ observations among the $n$;

(iii) Compute the mean $Y^*_{n-k}$ of the remaining $n - k$ observations;

(iv) Record the difference $\Delta^*_{n,-k} = Y_n - Y^*_{n-k}$;

(v) Repeat (ii) to (iv) a large number ($N$) of times.

The distribution $F_+$ of $\Delta_{n,+k}$ is then well approximated by the distribution of
$\Delta^*_{n,-k}$, corrected by the term of order $k/n$ (see Hall (1984) for more detailed
results). The approximation of $F_+$ can then be used to construct a critical region for
rejecting $H_0$ based on $\Delta_{n,+k}$.
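
Steps (i) to (v) can be sketched in a few lines of Python. The version below is a minimal illustration under our own naming: it returns the resampled differences and two-sided empirical quantiles, and it deliberately omits the correcting term of order $k/n$, which would have to be added separately.

```python
import random


def resample_delta_minus(xs, k, n_rep, rng):
    """Steps (i)-(v): n_rep draws of Delta*_{n,-k} = Y_n - Y*_{n-k}."""
    n = len(xs)
    total = sum(xs)
    y_n = total / n                                 # step (i)
    draws = []
    for _ in range(n_rep):                          # step (v)
        removed = rng.sample(xs, k)                 # step (ii), without replacement
        y_star = (total - sum(removed)) / (n - k)   # step (iii)
        draws.append(y_n - y_star)                  # step (iv)
    return draws


def critical_values(draws, level=0.05):
    """Two-sided empirical quantiles of the resampled differences (uncorrected)."""
    s = sorted(draws)
    lo = s[int((level / 2) * (len(s) - 1))]
    hi = s[int((1 - level / 2) * (len(s) - 1))]
    return lo, hi
```

A new batch would then be flagged when the observed $\Delta_{n,+k}$ falls outside the interval returned by `critical_values`, once that interval has been adjusted for the $k/n$ correction.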



4.2 Distribution of $\Delta_{n,+k}$ under $H_1$

Under $H_1$, noting $\sigma_1^2$ the variance of the distribution $F_1$ and assuming
$\mu_0 \neq \mu_1$, Theorem 3.3 implies

$$F_+\left(\frac{\sqrt{k}\sigma_1 x}{n}\right) \approx \Phi\left(x - \frac{\sqrt{k}(\mu_1 - \mu_0)}{\sigma_1}\right)$$

where $\Phi$ is the standard normal distribution function. The distribution of
$\frac{n}{\sqrt{k}\sigma_1}\Delta_{n,+k}$ under $H_1$ is approximately gaussian with mean
$\frac{\sqrt{k}(\mu_1 - \mu_0)}{\sigma_1}$ diverging (in absolute value) to infinity with $k$.
The difference between $F_+$ and $F_-$ is of order $O(1)$ and terms correcting for the lack of
gaussianity of the observations are negligible compared to the main term. Given the boundary of
the rejection zone calculated in Section 4.1, the approximate power of the test can then easily
be computed.
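
As an illustration, the following sketch turns the gaussian approximation under $H_1$ into an approximate power, given a rejection region of the form "$\Delta_{n,+k}$ outside $[lo, hi]$". The interval bounds, the parameter names and the use of the leading gaussian term only are assumptions of the example.

```python
import math


def std_normal_cdf(x):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def approx_power(lo, hi, n, k, mu0, mu1, sigma1):
    """Approximate P(Delta_{n,+k} outside [lo, hi]) under H1 (leading gaussian term only)."""
    scale = math.sqrt(k) * sigma1 / n              # Delta = scale * x
    shift = math.sqrt(k) * (mu1 - mu0) / sigma1    # mean of the rescaled difference
    inside = std_normal_cdf(hi / scale - shift) - std_normal_cdf(lo / scale - shift)
    return 1.0 - inside
```

For fixed $n$ and a fixed rejection region, increasing $k$ or the separation $|\mu_1 - \mu_0|/\sigma_1$ increases the shift and therefore the approximate power.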



4.3 Discussion of the results 



About the remainder term: Theorems 3.1 and 3.3 are derived for very general distribution
functions: they hold under mere moment conditions. When the distribution at hand is
better specified, more accurate results can reasonably be expected. But in the absence of
any further assumptions, the remainder of order $o(k/n)$ is possibly the best we can achieve.
For example, if the distribution function is skewed, tedious calculations show that the
remainder is at least of order $O(\sqrt{k}/n)$. And we can get closer to $k/n$ by mimicking
discrete lattice distributions. Lattice distributions are off-limits but can be seen as the
limiting case of non-lattice discrete distributions: a discrete non-lattice distribution with
jumps of size $1/2 - \epsilon$ at points $\pm 1$ and size $\epsilon$ at points $\pm\sqrt{2}$ is
very close to a lattice distribution with jumps of size $1/2$ at points $\pm 1$ for small enough
$\epsilon$. For the limiting case of $F_0$ being such a lattice distribution, and for odd $k$
such that neither $n/k$ nor $(n - k)/k$ is an integer, $F_+$ has a jump of asymptotic size
$\sqrt{2/\pi k}$ at point $1/(n + k)$ whereas $F_-$ has no jump at that point. Since
$\frac{k}{n} \frac{x e^{-x^2/2}}{\sqrt{2\pi}}$ has no jump whatsoever at any point, the extremum
of $(F_+(\sqrt{k}\sigma x/n) - F_-(\sqrt{k}\sigma x/n)) - \frac{k}{n} \frac{x e^{-x^2/2}}{\sqrt{2\pi}}$
is at least of the order of that jump, and thus of order at least $k^{-1/2}$. Since
$k^{-1/2} \sim n^{-\alpha/2}$, which can be arbitrarily close to $k/n$ as $\alpha$ decreases
towards $2/3$, the $o(k/n)$ cannot be improved upon in this case.

On the other hand, gaussian variables have such a nice distribution that most calculations
about $F_-$ and $F_+$ can be done exactly. Most important of all, whatever the value of $k$, if
$(X_{n+1}, \dots, X_{n+k})$ is a gaussian vector, then any linear combination of
$X_{n+1}, \dots, X_{n+k}$ is gaussian. Going back to Eq. (4), the first term is exactly gaussian
and there is no need whatsoever for correcting terms of order $k^{-1/2}$. This is the most
favorable case, for which the remainder in Theorem 3.1 has the smallest order of magnitude.

Under $H_0$, if the $X_i$ are gaussian with mean $\mu$ and variance $\sigma^2$, then
$\Delta_{n,+k} \sim \mathcal{N}(0, \frac{k\sigma^2}{n(n+k)})$,
$\Delta_{n,-k} \sim \mathcal{N}(0, \frac{k\sigma^2}{n(n-k)})$
and we can derive the following result:

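Proposition 4.1 Let $\Delta_{n,+k}$ and $\Delta_{n,-k}$ be defined as before, then:

$$F_+\left(\frac{\sqrt{k}\sigma x}{n}\right) - F_-\left(\frac{\sqrt{k}\sigma x}{n}\right) = \frac{k}{n} \frac{x e^{-x^2/2}}{\sqrt{2\pi}} + O\left(\frac{k^2}{n^2}\right)$$

uniformly in $x$.

Prop. 4.1 is better than the result provided by Theorem 3.1 as $O(k^2/n^2)$ is smaller than
$o(k/n)$. Further algebra can even prove here that the $O(k^2/n^2)$ term is no greater than
$1.1 k^2/n^2$, uniformly in $x$.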
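
In the gaussian case both distribution functions are explicit, so the gap between the exact difference and the leading term can be evaluated directly. The sketch below does so under $H_0$, using the fact that the rescaled perturbations are centered gaussian with variances $n/(n+k)$ and $n/(n-k)$; the function names and the evaluation grid are ours.

```python
import math


def std_normal_cdf(x):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def gaussian_gap(x, n, k):
    """Exact F_+ - F_- evaluated at sqrt(k) * sigma * x / n for gaussian data under H0."""
    eps = k / n
    return std_normal_cdf(x * math.sqrt(1.0 + eps)) - std_normal_cdf(x * math.sqrt(1.0 - eps))


def leading_term(x, n, k):
    """Leading term (k / n) * x * exp(-x^2 / 2) / sqrt(2 * pi)."""
    return (k / n) * x * math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)


if __name__ == "__main__":
    n, k = 1000, 50  # illustrative values
    grid = [i / 10 for i in range(-50, 51)]
    worst = max(abs(gaussian_gap(x, n, k) - leading_term(x, n, k)) for x in grid)
    print(worst, (k / n) ** 2)
```

Comparing the printed worst-case gap with $(k/n)^2$ for a few values of $k$ and $n$ gives a quick feel for how small the remainder is in this favorable case.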

Under hypothesis $H_1$, we have:

$$\frac{n\Delta_{n,+k}}{\sqrt{k}\sigma_1} \sim \mathcal{N}\left(\frac{n}{n+k} \frac{\sqrt{k}(\mu_1 - \mu_0)}{\sigma_1}, 1 + \frac{\lambda k}{n}\right)$$

where $\lambda < 1 + \frac{\sigma_0^2}{\sigma_1^2}$. As expected, the result is again slightly
more accurate than would be obtained by Theorem 3.3 alone, as the remainder is exactly, instead
of at least, of order $k/n$. In the gaussian case, we can thus easily improve upon results
from Section 3.
About Cramer's Condition: Cramer's condition plays a crucial role in the proof
of Theorem 3.1. Without Cramer's condition, there is no guarantee that jumps of the
distribution function $F_+$ are of order $o(k^{-1})$ and higher order moments of $F_+$ cannot be
used to improve the range of $k$ that can be used. Indeed, as the binomial example emphasizes for
the forbidden but limiting case of lattice distributions, jumps can be of order $k^{-1/2}$. But
for non-lattice discrete distributions, the maximum jump is at most of order $o(k^{-1/2})$ and
can be much smaller than that, for example $o(k^{-1})$. In this case, it might be possible upon
further work to increase the range of values $\alpha$ can take in Theorem 3.2.



5 Proofs 

Before we proceed to the proofs of Theorems 3.1, 3.2 and 3.3, we recall some lemmas concerning
the expansion of $f^k(t/\sqrt{k})$.

Without loss of generality, we assume $E[X] = 0$. Note $\sigma^2$ the variance of $X$,
$\alpha_j = E[X^j]$ the moment of order $j$ and $\kappa_j$ the $j$-th cumulant of $X$, defined as:

$$\kappa_j = \frac{1}{i^j} \left. \frac{d^j}{dt^j} \ln f(t) \right|_{t=0} = \frac{1}{i^j} (\ln f)^{(j)}(0)$$



5.1 Previous Results 

Lemma 5.1 (Esseen45) Let $(X_i)$ be a sequence of i.i.d. random variables and $m \ge 3$ an
integer such that $E[|X|^m] < \infty$, then



fx 



t 



( m-2 



1 + 



L 

7=1 



ki'^ 



< ^{\tr + \tr'-'>)e-T for \t\<^, 

k"^ y 



where $P_j(it) = \sum_{\nu=1}^{j} c_{j\nu} (it)^{j+2\nu}$ is a polynomial of degree $3j$ in $it$,
the coefficient $c_{j\nu}$ being a polynomial in the cumulants
$\kappa_3, \dots, \kappa_{j-\nu+3}$, and $\delta(k) \to 0$.

Lemma 5.2 (Esseen45) Let $(X_i)$ be a sequence of i.i.d. random variables and $2 < \nu \le 3$ a
real number such that $\beta_\nu = E[|X|^\nu] < \infty$, then there exists a constant $C_\nu$
depending only on $\nu$ such that



fx 



e 2 



01 



Cv jSv,,,v _f£ , ,,, cr<'-2 ^/k 
< —^— tl^e * for \t\ < - 



Lemmas 5.1 and 5.2 are proved in Esseen (1945) (p. 44). An alternative proof can be found
in Cramer (1937) (p. 71 and 74).



Lemma 5.3 (Esseen48) Let $X$ be a non-lattice discrete random variable, then for every
$\eta > 0$ there exists a positive function $\lambda(k) \to \infty$ such that:



i \^/kl 



11 



The proof of Lemma 5.3 can be found in Esseen (1945) (Lemma 1, p. 49).
We recall one last theorem before proceeding to the proof.

Theorem 5.4 (Esseen48) Let $A$, $T$ and $\epsilon$ be arbitrary positive constants, $F(x)$ a
non-decreasing function, $G(x)$ a real function of bounded variation on the real axis, $f(t)$
and $g(t)$ the corresponding Fourier-Stieltjes transforms such that:

1. $F(-\infty) = G(-\infty) = 0$, $F(\infty) = G(\infty)$

2. $G'(x)$ exists everywhere and $|G'(x)| \le A$

3. $\int_{-T}^{T} \left| \frac{f(t) - g(t)}{t} \right| dt = \epsilon$

To every number $k > 1$, there corresponds a finite positive number $c(k)$, only depending on
$k$, such that

$$|F(x) - G(x)| \le k \frac{\epsilon}{2\pi} + c(k) \frac{A}{T}$$

The proof of Theorem 5.4 is given in Esseen (1945) (Theorem 2.a, p. 32).



5.2 New Results 

Lemma 5.5 is a generalization of Lemma 5.1.

Lemma 5.5 Suppose that $(X_i)$ is a sequence of i.i.d. random variables such that $E[|X|^m] < \infty$ for an integer
m>3, then for \t\ < f^: 



fx 



t n 1 
^lko^ + ^) 



e--\l + — 
\ n 



kt^\{^ '^'Pjiit)^ 



< 



[k 2 ^ } 



If- 

'n2 



where $P_j(it) = \sum_{\nu=1}^{j} c_{j\nu} (it)^{j+2\nu}$ is a polynomial of degree $3j$ in $it$,
the coefficient $c_{j\nu}$ being a polynomial in the cumulants
$\kappa_3, \dots, \kappa_{j-\nu+3}$, $\lim_{k \to \infty} \delta(k) = 0$ and $C_m$ and $C'_m$ are
constants depending only on $m$.

Proof. It follows from Lemma 5.1 that



fx 



t n ] 
ylko^ + ^} 



ki'^ 



We now expand e ^ in power of ^ and arrange the terms in a convenient order. 



k^t^ 



n 2{n + ky 
Furthermore ^ ~y (l + ^) - ~T' where the last inequality holds for large enough n.
It then follows from a Taylor expansion that 



\ n , 



. _ <e~~ 



kH' 



We also have, for any integer / 



n 2{n + kY, 



n 



2 ■ 



And thus there exist a constant Kj, not depending on n and / such that 



Pj{it{l + -)) - Pjiit) 



n 



< K0r' + \tfi)- 
n 



It follows that there exists a positive constant Cm, depending neither on n nor k such that 



Pj{it{l + ^)) 



m-2 



;=1 I j=l 



m-2 



Finally e'^ (l + ^) < Sg-x and there exists a constant C;„ such that 1 + ^| < C;„(l + 
|i|3(m-2)-)_ Pqj. ^j^y f^^j. _ < _ |,)| ^ _ ^)|_ ug-j^g A = e-^1 + 

fl = B = 1 + L;-^ and & = 1 + Ejri' ^ we obtain: 



( m-2 



1 + 



+ ^)) 



n 



From which the result immediately follows. 



Lemma 5.6 With the notations previously defined and under the conditions of Theorem 3.1



fx 



( - ^kt 

{n-k)o ^ 

^ - ^kt y 

(n + k)a 



X 



< K.{t' + t')^ 



< K4f + t')^^ 
n^ 



uniformly for \t\ < j^, where K+ and are constants not depending on n,kor X. 
Proof. Since the two inequalities are proved in the same way, we prove only the first one.
It is readily observed that $\beta_m^{1/m}$ increases with $m$, thus
$\beta_3^{1/3} \le \beta_m^{1/m}$. It follows by taking $\nu = 3$ in
Lemma 5.2 that for $|t| <$



fx 



e 2 



< C 



J /J A 



A simple decomposition of the quantity to upper bound yields 



( -^Jkt \ 
{n - k)o 



n-k 



I n2, 



< 



fx 
expl 



yfkt 



, n-k 



exp 



(n - k)o ^ 

kf f\ 

'2{n-k)2j ^'"Pj 2n 



2{n - k) 
kf 



+ 



+ 



kf' 



(11) 
kt" 



For large enough n, k < n - k and thus for |t| < -^-^ < 2-^^, the first term of the right-hand side of Eq. (11) is upper bounded by



fx 



{n - k)o 



n-k 



e n-k 2 



< C 



^3 k'" 
ff3 (n-fc)2' 



k^l\ 



where K2 = €3(^3/0^ sup Jn^/{n - k)^}. 

Using the classical inequality $|e^{x+y} - e^{x}| \le |y|e^{x}$ for $y < 0$, we bound the second term of Eq. (11)

2 V?- 

where $K_1 = \sup_n \{n/(n-k)\}/2$. Finally we bound the third term of Eq. (11) using the inequality
$|e^{-x} - (1 - x)| \le x^2/2$ for $x > 0$:



k k 


kt^ 


k 


k 


g H-k 2 — g 1! 2 


< e "2 










n-k 


n 



ktl / kt^^ 

e « 2 - 1 — 

n2, 



< 

~ n2 4 



Since k^^^/n^ = o{k^/n^), for K+ large enough 



F 1 F 1-2 



which ends the proof of the first part of the lemma. Replacing n - k by n + k, the same
argument holds and yields the second inequality of the lemma. ■
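The proof above relies on two elementary exponential inequalities. As a quick sanity check, the following Python sketch evaluates both on a grid. It assumes they read $|e^{x+y} - e^{x}| \le |y|e^{x}$ for $y \le 0$ and $|e^{-x} - (1 - x)| \le x^2/2$ for $x \ge 0$ (the first one is partly illegible in the source, so its exact form is an assumption), and the function names are illustrative only.

```python
import math


def exp_shift_gap(x, y):
    """Return |y| e^x - |e^(x+y) - e^x|, which should be >= 0 when y <= 0."""
    return abs(y) * math.exp(x) - abs(math.exp(x + y) - math.exp(x))


def exp_linear_gap(x):
    """Return x^2/2 - |e^(-x) - (1 - x)|, which should be >= 0 when x >= 0."""
    return 0.5 * x * x - abs(math.exp(-x) - (1.0 - x))


if __name__ == "__main__":
    # Small grid; a tiny negative tolerance absorbs floating-point rounding.
    xs = [i / 4.0 for i in range(-20, 21)]
    ys = [-i / 4.0 for i in range(0, 41)]
    print(min(exp_shift_gap(x, y) for x in xs for y in ys) >= -1e-9)
    print(min(exp_linear_gap(i / 4.0) for i in range(0, 81)) >= -1e-12)
```

Both gaps vanish only at y = 0 and x = 0 respectively, which is why the bounds are tight for the small arguments that matter in the proof.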



Lemma 5.7 With the notations previously defined and under the conditions ofTheorem[ 

( _ ^ ^^^^ II 

A: 2 w 



CT/' 



2n 



2n 



fc//2 



5W 

m-2 
: 2 

b{k) ^ Vfc 



< ^TiT + C,,,— (If + 

K 2 ?^ I 



+K;4e-^(|fp + 1^13-2) 
uniformly for \t\ < where K'_ and K'^ are constants not depending on n and k.



Proof. For any four reals $A, B, a, b$, $|AB - ab| \le |B(A - a)| + |a(B - b)|$. We take A = fx {-;j^f^
a = (l + B = fx h = (l- i). Usmg \a\ < Q„e-^(1 + lipO-^)) ar^d 



Lemma |5. 61 



7 2 

\a{B - b)\ < Cn,K.{l + + t^-'T < r — (|f|2 + |f|3"'-2)e-f 



n2' 



,2|fP + |f|4 + |f|3'"-4 + l^|3m-2- 



where K'_ = C,nK. sup <^ e " i^i2 ^ if i3m-2 r Similarly using |B| < 1 and Lemma lSH 



\B{A - a)\ < ^{\tr + 



Combining these two inequalities gives the result for the first part of the lemma. The second
part is proved in the same way using Lemma 5.3 instead of Lemma 5.1. ■
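The product inequality used at the start of this proof is the triangle inequality applied to an exact identity. Spelled out (a routine step, added here only for convenience):

```latex
\begin{align*}
AB - ab &= B(A - a) + a(B - b), \\
|AB - ab| &\le |B|\,|A - a| + |a|\,|B - b|.
\end{align*}
```

The identity involves only sums and products, so the same bound holds for complex numbers, with the absolute values read as moduli.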



5.3 Proof of Prop n 



Lemma 5.8 Let $\Phi_a$ (resp. $\Phi_b$) be the cumulative distribution function of a centered normal random
variable with variance $a$ (resp. $b$). Furthermore assume there is $\varepsilon > 0$ such that $a = (1 + \varepsilon)^{-1}$ and
$b = (1 - \varepsilon)^{-1}$. Then, for vanishing $\varepsilon$:

$$\Phi_a(x) - \Phi_b(x) = \varepsilon\,\frac{x e^{-x^2/2}}{\sqrt{2\pi}} + O(\varepsilon^3)$$

uniformly in x.
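A numerical check can make the lemma concrete. The sketch below assumes the statement reads $\Phi_a(x) - \Phi_b(x) = \varepsilon x e^{-x^2/2}/\sqrt{2\pi}$ plus a higher-order remainder, with $a = (1+\varepsilon)^{-1}$ and $b = (1-\varepsilon)^{-1}$. The helper names and the evaluation grid are illustrative choices, not part of the paper.

```python
import math


def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def Phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def cdf_difference(x, eps):
    """Phi_a(x) - Phi_b(x) with variances a = 1/(1+eps), b = 1/(1-eps)."""
    a = 1.0 / (1.0 + eps)
    b = 1.0 / (1.0 - eps)
    return Phi(x / math.sqrt(a)) - Phi(x / math.sqrt(b))


def leading_term(x, eps):
    """First-order term eps * x * phi(x)."""
    return eps * x * phi(x)


def max_gap(eps):
    """Largest deviation from the leading term on a grid over [-6, 6]."""
    grid = [i / 10.0 for i in range(-60, 61)]
    return max(abs(cdf_difference(x, eps) - leading_term(x, eps)) for x in grid)


if __name__ == "__main__":
    for eps in (0.2, 0.1, 0.05):
        print(eps, max_gap(eps))
```

Halving `eps` should shrink the printed gap by a factor close to eight, consistent with a remainder of cubic order that is uniform in x.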



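Proof. Since $\Phi_{\sigma^2}(x) = \Phi(x/\sigma)$, we have $\Phi_a(x) = \Phi(x/\sqrt{a})$. By hypothesis $a^{-1/2} = (1 + \varepsilon)^{1/2} = 1 + \varepsilon/2 - \varepsilon^2/8 + O(\varepsilon^3)$. A Taylor expansion around x gives

$$\Phi\left(\frac{x}{\sqrt{a}}\right) = \Phi(x) + x\Phi'(x)\left(\frac{\varepsilon}{2} - \frac{\varepsilon^2}{8} + O(\varepsilon^3)\right) + x^2\Phi''(x)\left(\frac{\varepsilon^2}{8} + O(\varepsilon^3)\right) + x^3\Phi^{(3)}(c)\,O(\varepsilon^3)$$
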

where $c$ belongs to $(x, x/\sqrt{a})$. Since $x\Phi'(x)$ and $x^2\Phi''(x)$ can each be written $P(x)e^{-x^2/2}$ with $P$
a polynomial of degree lower than 4, they are bounded on $\mathbb{R}$. The same holds for $x^3\Phi^{(3)}(c)$
since |x^O^^*(c)| < sup^.gj^ \x^(t>'^^\x/ ^fa)\ < l.la^l^ < oo. We can therefore rewrite 



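$$\Phi_a(x) = \Phi(x) + x\Phi'(x)\left(\frac{\varepsilon}{2} - \frac{\varepsilon^2}{8}\right) + \frac{x^2\Phi''(x)\,\varepsilon^2}{8} + O(\varepsilon^3)$$

uniformly in x. The same arguments lead to

$$\Phi_b(x) = \Phi(x) - x\Phi'(x)\left(\frac{\varepsilon}{2} + \frac{\varepsilon^2}{8}\right) + \frac{x^2\Phi''(x)\,\varepsilon^2}{8} + O(\varepsilon^3).$$

Combining these two equations and using $x\Phi'(x) = x e^{-x^2/2}/\sqrt{2\pi}$ gives the result. ■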

Proof of Prop. E} Since, §ia\^^ = (l + ^)"' and ^c72 _^ = (l - |)"\ the result is a direct 
consequence of Lemma 5.8 when replacing ε by ^. ■



5.4 Proof of Theorem 3.1

Proposition 5.9 With the notations and under the conditions of Theorem 



f_ 



n 



= 0(x) 



m-2 



2^/2nn 



e 2 + 



0(x) + 



kx 



2^/2nn 

Uniformly in x, where D is the differential operator. 



-e 2 + 



L 

m-2 

L 

7=1 



— V-7— 0(x) + - 



Proof. The two expansions are obtained in the same way, so we focus on the first one. It
follows from Lemma |5^ that



A 



17 



— , m-2 



/-te)--(i-s)(i+E;if¥) 



< 



{\tr + \tf^'"-^^)e-^ + {\tf + |fp"^-2))e-^ 

OO ^ J —CO 



Since Cramer's condition holds, sup|j|>^-i/m |/x(OI < c < 1. It follows then that 



l/m 



,,„-2, /fc^ 



(12) 



(13) 



The same holds for e (l ~ ^) (l + Ljli^ ^^). Finally, combining Eq. (12) and Eq. (13)
gives



J-j 



J.m/2 



dt = o\ — 






Remark that k "'^^ ~ n "2" = o{n '"+2) = Using Theorem 5.4 with T = k we obtain:



f_ 



^/kox) 



n 



m-2 



(1 + ¥)P,(-D) 



7=1 



0(x) + a - 



The term ^1 + (-D)O(x) of the right-hand side gives 0(x) - j-^^s "2 when doing the 
inverse Fourier transform. The result then follows from ^^p7^0(x) = uniformly in 
X. Replacing 1 + ^ with 1 - ^ in the proof gives the second expansion. 



Proof of Theorem 3.1. The result is a direct consequence of Prop. 5.9.
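The expansions above are of Edgeworth type. As a hedged illustration of the general technique only (this is not the statistic studied in the paper), the Python sketch below compares the plain normal approximation with a one-term Edgeworth correction for the standardized sum of k i.i.d. Exp(1) variables, whose exact distribution function is available in closed form. The choice of the exponential distribution, the function names and the default cumulant values are assumptions made for the example.

```python
import math


def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def Phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def exact_cdf(x, k):
    """P((S_k - k) / sqrt(k) <= x) for S_k a sum of k i.i.d. Exp(1) variables."""
    s = k + x * math.sqrt(k)
    if s <= 0.0:
        return 0.0
    # Erlang cdf: 1 - exp(-s) * sum_{i < k} s^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= s / i
        total += term
    return 1.0 - math.exp(-s) * total


def edgeworth_cdf(x, k, kappa3=2.0, sigma=1.0):
    """One-term Edgeworth approximation.

    Phi(x) - kappa3 * (x^2 - 1) * phi(x) / (6 * sigma^3 * sqrt(k)),
    with kappa3 = 2 and sigma = 1 for the Exp(1) distribution.
    """
    correction = kappa3 * (x * x - 1.0) * phi(x) / (6.0 * sigma ** 3 * math.sqrt(k))
    return Phi(x) - correction


if __name__ == "__main__":
    k = 10
    for x in (-2.0, 0.0, 2.0):
        print(x, exact_cdf(x, k), Phi(x), edgeworth_cdf(x, k))
```

The correction term vanishes at x = ±1, so the comparison is most informative away from those points.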



5.5 Proof of Theorem 3.2



Remark: Cramér's condition is essential to ensure that the Edgeworth expansion of /+ is
valid up to the order m. If it does not hold, then Lemmas 5.1 and 5.5 are still valid but
jT ]/(£)!_ does not decrease exponentially fast anymore. We are limited to T of order k^^-^ in
Theorem 5.4, so that only expansions of order 1 are available. But order 1 is not enough if n
grows too fast compared to k.
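To illustrate what Cramér's condition rules out, recall its usual form: the modulus of the characteristic function stays bounded away from 1 as |t| grows. The Python sketch below contrasts a lattice distribution (Bernoulli, for which the modulus returns to 1 at every multiple of 2π) with a continuous one (Exp(1), for which the modulus decays). Both distributions and all function names are illustrative assumptions, not taken from the paper.

```python
import cmath
import math


def cf_bernoulli(t, p=0.5):
    """Characteristic function of a Bernoulli(p) variable."""
    return (1.0 - p) + p * cmath.exp(1j * t)


def cf_exponential(t, rate=1.0):
    """Characteristic function of an exponential variable with the given rate."""
    return rate / (rate - 1j * t)


def sup_modulus(cf, t_min, t_max, steps=20001):
    """Grid approximation of sup |cf(t)| over t_min <= t <= t_max."""
    width = (t_max - t_min) / (steps - 1)
    return max(abs(cf(t_min + i * width)) for i in range(steps))


if __name__ == "__main__":
    # Lattice case: the supremum over large t stays at 1, so the condition fails.
    print(sup_modulus(cf_bernoulli, 1.0, 50.0))
    # Continuous case: the supremum is attained at t_min and is below 1.
    print(sup_modulus(cf_exponential, 1.0, 50.0))
```

For the exponential case the modulus equals $1/\sqrt{1 + t^2}$, so the supremum over $|t| \ge 1$ is $1/\sqrt{2}$.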

Proposition 5.10 With the notations and under the conditions of Theorem 3.2,

2^/2nn \nj 




= 0(x) + 



kx 



2^/2nn 



_£ Pi(-D)^, ^ k^ 



Uniformly in x. 



Proof. As for Prop. 5.9, the result is an application of Theorem 5.4. It follows from Lemma 5.7
that



A 



-L 



1/3 



T73 
'3 



4/3: 



dt 



< 



m 



v/ — C 



^ 2 Ir^ f^°° 2 

{\tf + \tf)e-T + K'_- {\t\' + \t\y-'-^ 
3 ^ J-00 



(14) 



Remark that, since α > 2/3, k ~ n "''^ = o{n" ^) = o(^). Since Cramér's condition does
not hold, we resort to Lemma 5.3, from which it follows that



dt < 



■a ^lkA(k) 



T/3 

'3 



■dt 



- u 



^* = or.'v„(i) (15) 
And the same holds for e'^ (l - ^) (l + Combining Eq. (14) and Eq. (15) yields



I 



dt = o\ — 

\nl 



Since $A(k) \to \infty$ as k, or equivalently n, goes to infinity, = ^^^) = o Theorem 5.4 then implies:



Vkox 



1 + 



2n . 



(-D)O(x) + 



(1 + ¥)P,(-D) 



0(x) + a - 



where D is the differential operator. The first term of the right-hand side gives 



kx 



e '2 . The result then follows from ^ 0(x) = uniformly in x. Replacing 1 + ^ 



with 1 - ^ in the proof gives the second expansion. ■ 
Proof of Theorem 3.2. The result is a direct consequence of Prop. 5.10.



5.6 Proof of Theorem 3.3



Remark: As soon as X and Y have different expectations, $A_{n,+k}$ is no longer centered
and the central limit theorem is enough to get the first-order expansion of its distribution
function. Up to a normalization constant, $A_{n,+k}$ drifts away to $\pm\infty$, depending on the sign
of $\mu_1 - \mu_0$.

Following the same lines as the proof of Theorem 3.2, we first note that



nt 



Using Lemma |5^ 



n V%i-fio). r 



n 



y/kt 



{n + k)oi 

And it comes from Lemma l5Al that 
n t 



1 



olkt^' 
loin. 



'\n + k VfccTi 



k^ 



< w + for \t\ < 



n 



n 



<^(tf + |t|V* for |t|<^^ 



1/3 



Using the trick $|AB - ab| \le |A(B - b)| + |b(A - a)|$ with A = fx-,, B = fy.,, [j^,^] ,

fl = 1 - ^ and & = (1 + f ) + 0f§), it comes from \A\ < 1 and \b\ < K{\ + \tf)e-'^ 



that 



{n + k)oi 



n 



kf 



Y-f, 



'\n + k 



In 



a\ ky^ 



<Ki — + 
n 



2 
for \t\ < B



l(gO,(Tl) Vfc 



. It then follows that 



J-B 



f ( ^kt \" r (j, t_ 



dt = o\-\ + o{k-^'^). 



Lemma 5.3 combined with Theorem 5.4 then provides the following result:



n 



n + k Ol 



[ yjkoi n + k Ol J 



^/ s '-i^ ^Q\xe '^e 2 as {l-x^)e 2 lk\ 
= (D(x) + -|2-4| ^ _J^+o|-| + o(r^/2) 



n 



oil 2^/2n 6(7^ 



71 



uniformly in x, where fi = min(|, 1 - a). In addition, if x is bounded by some M, we further
have

^^^kolx] ( V^(fii-fio)^ 



n 



X - 



C7l 



which concludes the proof. 